Application of SARIMA model in forecasting and analyzing inpatient cases of acute mountain sickness

Background Acute mountain sickness (AMS) is typically triggered by hypoxia at high altitude. The temporal pattern of AMS hospital admissions is currently unclear. This study therefore aimed to analyze the time distribution of AMS inpatients over the past ten years and to construct a prediction model for AMS hospitalized cases. Methods We retrospectively collected medical records of AMS inpatients admitted to military hospitals from January 2009 to December 2018 and analyzed their time series characteristics. A Seasonal Auto-Regressive Integrated Moving Average (SARIMA) model was fitted to the training data and then used to forecast the test data. Results A total of 22 663 inpatients were included in this study and recorded monthly. Admissions showed two predominant peaks each year, in early spring (March) and in mid-to-late summer (July to August). Using the training data from January 2009 to December 2017, the model SARIMA (1,1,1) × (1,0,1)12 was employed to predict the test data from January 2018 to December 2018. In 2018, the total predicted value after adjustment was 9.24% less than the actual value. Conclusion AMS admissions show obvious periodicity and seasonality. The SARIMA model has good fitting ability and high short-term prediction accuracy. It can help explore the characteristics of AMS and provide a decision-making basis for allocating relevant medical resources for AMS inpatients.


Introduction
Acute mountain sickness (AMS) occurs when people ascend to areas above 2500 m and fail to adapt physiologically to high altitude. It is characterized by headache, insomnia, dizziness, fatigue, gastrointestinal reactions (anorexia, nausea or vomiting), and other symptoms [1,2]. The plateaus in China are mainly distributed in Xinjiang, Tibet, and Qinghai, as well as in the highlands of Yunnan, Gansu, and Sichuan. The plateau has a complex geographic environment, with high altitude, low oxygen partial pressure, a cold climate, strong and dry winds, and frequent natural disasters [3]. Construction workers, residents, and disaster relief and poverty alleviation teams in Tibet have all reported various degrees of AMS. Among the builders of the Qinghai-Tibet Railway, the incidence of AMS on first high-altitude exposure was more than 50% [4]. After entering Tibet from the plains, 146 out of 640 passengers presented with AMS, an incidence rate of about 23% [5]. After the "4·14" earthquake in Yushu, Qinghai, 193 officers and workers flew to the plateau for rescue and relief. Among them, 154 reported AMS, an incidence rate close to 80%, attributed to relatively heavy physical exertion [6].
In the past 10 years, a large number of studies have focused on the incidence of AMS after high-altitude exposure under various conditions, but the temporal pattern of AMS hospital admissions has remained unclear. This study aimed to analyze and predict the time series of AMS inpatients admitted to military hospitals in plateau areas over the past 10 years by means of the Seasonal Auto-Regressive Integrated Moving Average (SARIMA) model. The resulting trend may provide a reference for allocating the medical resources used to treat the disease.

Data collection
Medical records of AMS inpatients admitted to military hospitals from January 2009 to December 2018 were collected. The hospitals on the plateau served both military personnel and civilians. The standardized AMS diagnosis code (T70.2) of the International Classification of Diseases, ICD-10, was used as the retrieval basis. The training set and test set constituted 95.18% and 4.82% of the total data, respectively. The present study was performed with historical de-identified data; thus, it was exempt from Institutional Review Board approval.

Model establishment
Statistical modeling of AMS admissions was performed with Statistical Product Service Solutions (SPSS) 22.0 in three steps: variance analysis, model building, and predictive analysis. Variance analysis comprised a normality test, an overall comparison between groups, and pairwise comparisons. Model building comprised stationarity assessment, model selection, a white noise test, and parameter estimation. Predictive analysis used the fitted model to calculate monthly admissions.

Variance analysis
A frequency distribution histogram and a QQ plot of monthly AMS admissions were drawn. Normality was evaluated with the Shapiro-Wilk (S-W) test. Differences in admissions between months were tested using one-way ANOVA. Levene's test was used to check the assumption of homogeneity of variances. The Dunnett T3 method was employed for pairwise comparisons between any two groups if the variances were unequal.
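As an illustration of the overall comparison step, the sketch below computes the one-way ANOVA F statistic by hand as the ratio of the between-group to the within-group mean square. It is a minimal sketch, not the SPSS procedure used in the study: the function name `one_way_anova_f` and the toy data are hypothetical, and the p-value (which needs the F distribution) is left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical counts for three calendar months over three years (not study data).
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

For data shaped like the study's, each group would presumably hold the ten yearly counts of one calendar month, giving twelve groups.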

Model selection
ARIMA (p, d, q) and ARMA (p, q) models combine autoregressive (AR) and moving average (MA) components, with or without differencing. AR (p) is given by

$$y_t = \varphi_0 + \varphi_1 y_{t-1} + \varphi_2 y_{t-2} + \cdots + \varphi_p y_{t-p} + \varepsilon_t$$

where $y_t, y_{t-1}, y_{t-2}, \ldots, y_{t-p}$ are stationary, $\varphi_0, \varphi_1, \varphi_2, \ldots, \varphi_p$ are constants, and $\varepsilon_t$ is a Gaussian white noise series with mean zero. MA (q) is given by

$$y_t = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$$

where there are q lags in the moving average, $\theta_1, \theta_2, \ldots, \theta_q$ are parameters, and $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_{t-q}$ is a Gaussian white noise series with mean zero.
The backshift operator (B) is defined as

$$B y_t = y_{t-1}, \qquad B^k y_t = y_{t-k}$$

Thus, ARIMA (p, d, q) can be written briefly as

$$\varphi(B)\,\nabla^d y_t = \varphi_0 + \theta(B)\,\varepsilon_t$$

where $\varphi(B) = 1 - \varphi_1 B - \cdots - \varphi_p B^p$, $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$, and $\nabla^d = (1 - B)^d$ is the difference operator.
The SARIMA model, which combines the non-seasonal model ARIMA (p, d, q) with the seasonal model ARIMA (P, D, Q)S, was employed for prediction of the seasonal, nonstationary time series. The model, denoted generally as ARIMA (p, d, q) × ARIMA (P, D, Q)S, is

$$\Phi(B^S)\,\varphi(B)\,\nabla^d \nabla_S^D y_t = \varphi_0 + \Theta(B^S)\,\theta(B)\,\varepsilon_t$$

where $\Phi(B^S) = 1 - \Phi_1 B^S - \cdots - \Phi_P B^{PS}$, $\Theta(B^S) = 1 - \Theta_1 B^S - \cdots - \Theta_Q B^{QS}$, $\nabla^d = (1 - B)^d$ is the difference operator, and $\nabla_S^D = (1 - B^S)^D$ is the seasonal difference operator.

Stationarity test
A time series chart of AMS admissions was drawn for an initial judgment of the stationarity of the series. Autocorrelation function (ACF) and partial autocorrelation function (PACF) graphs were drawn. The order of differencing was determined from the decay of the autocorrelation and partial autocorrelation coefficients. The Augmented Dickey-Fuller (ADF) unit root test was used to test whether the series was stationary.

Model selection
The appropriate value ranges for p, d, q and P, D, Q, and S were determined from the tailing and truncation patterns of the ACF and PACF graphs.

Parameter estimation
Least squares estimation was used to estimate the parameters, and a white noise test was performed on the residuals of the selected ARIMA (p, d, q) × ARIMA (P, D, Q)S model to check whether the information in the time series had been fully extracted. The normalized Bayesian information criterion (BIC) was used to penalize overfitting.
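The differencing and autocorrelation steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the SPSS routine used in the study: the function names `difference` and `sample_acf` and the toy series are hypothetical, and the ACF uses the common estimator that divides every lag by the lag-0 sum of squares.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag): subtract the value observed `lag` steps earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    m = sum(series) / n
    dev = [x - m for x in series]
    denom = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Toy series (not study data): first differencing removes the curved trend.
toy = [1, 3, 6, 10, 15, 21]
first_diff = difference(toy)              # [2, 3, 4, 5, 6]
lag2_diff = difference([1, 3, 6, 10], 2)  # [5, 7]
acf = sample_acf([1, 2, 3, 4, 5], 2)
print(first_diff, lag2_diff, acf)
```

For monthly data, `difference(series, lag=12)` would give the seasonal difference, and a slowly decaying ACF on the undifferenced series is the usual visual hint that differencing is needed.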

Predictive analysis
The data were divided into a training set and a test set. The training data spanned January 2009 to December 2017, and the test data spanned January to December 2018.
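The split can be cross-checked with simple arithmetic using counts reported elsewhere in this paper (22 663 inpatients in total, 1093 actual inpatients in 2018). The sketch below assumes, as those figures suggest, that the stated 95.18% and 4.82% shares are computed over inpatient counts rather than over months; the variable names are illustrative.

```python
# Counts reported in this paper.
total_inpatients = 22663  # all AMS inpatients, 2009 to 2018
test_inpatients = 1093    # actual inpatients in 2018 (test year)

train_inpatients = total_inpatients - test_inpatients
train_share = round(100 * train_inpatients / total_inpatients, 2)
test_share = round(100 * test_inpatients / total_inpatients, 2)

# Length of each period in months.
train_months = 9 * 12  # January 2009 to December 2017
test_months = 12       # January to December 2018

print(train_inpatients, train_share, test_share, train_months, test_months)
```

By month count the test year would be 12 of 120 months (10%), so the smaller 4.82% share reflects the lower admission numbers in 2018.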

Data distribution
The QQ plot showed the distribution of the number of AMS inpatients from January to December (Fig. 1, Table 1).

Analysis of variance
The homogeneity of variance test found unequal variances (F = 3.39, p < 0.01). One-way ANOVA showed significant differences among months (F = 6.67, p < 0.01). As illustrated in the line chart, AMS admissions followed a pattern of two peaks and three dips, with higher levels in early spring and mid-to-late summer and lower levels in late autumn and winter (Fig. 2). The Dunnett T3 test was used for pairwise comparisons between months. The number of AMS inpatients peaked in March and July, when it was significantly higher than in other months (p < 0.05) (Table 2).

Time series
A time series chart of monthly AMS admissions was drawn. The line graph shows admissions over time, with seasonal fluctuation around a downward trend (Fig. 3A). The autocorrelation and partial autocorrelation coefficients did not cut off after a particular lag; instead, they decreased gradually, showing a tailing pattern (Fig. 3B&C). The ADF unit root test showed that the original series was not stationary (t = -0.39, p = 0.91) and that it became stationary after first-order differencing (t = -6.56, p < 0.01).

Model selection and parameter estimation
The SARIMA model was selected to deal with short-term correlation and seasonal effects. The white noise test statistic Ljung-Box Q, the Stationary R-squared, and the penalty function Normalized BIC were used to determine p, d, q and P, D, Q, S. For the prediction models without seasonal information (AR, MA, ARMA, ARIMA), the Stationary R-squared was less than 0.5. Among the candidate models, SARIMA (1,1,1) × (1,0,1)12 extracted the most information from the series, although the Ljung-Box test indicated that its residuals were not a white noise sequence (Q = 18.97, p = 0.01). The model had the largest Stationary R-squared, the lowest Normalized BIC, and relatively better prediction (Table 3).
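For readers unfamiliar with the white noise test, the Ljung-Box statistic is Q = n(n+2) Σ r_k² / (n − k), summed over lags k = 1..h, where r_k is the lag-k autocorrelation of the residuals. The sketch below is a minimal plain-Python version under stated assumptions: the function name `ljung_box_q` and the toy residuals are hypothetical, and the p-value (from a chi-square distribution) is not computed.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    m = sum(residuals) / n
    dev = [x - m for x in residuals]
    denom = sum(d * d for d in dev)  # lag-0 sum of squares
    total = 0.0
    for k in range(1, max_lag + 1):
        # Lag-k sample autocorrelation of the residuals.
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


# Toy residuals (not study data), chosen so the result is easy to check by hand.
q_stat = ljung_box_q([1, 2, 3, 4, 5], max_lag=1)
print(q_stat)
```

A large Q with a small p-value, such as the reported p = 0.01, indicates that some autocorrelation remains in the residuals.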
Parameter estimation yielded a statistically significant constant and coefficients: the constant term was 148.48, the autoregressive coefficient was 0.47, the moving average coefficient was 1.00, the seasonal autoregressive coefficient was 1.00, and the seasonal moving average coefficient was 0.93 (Table 4).
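To make the structure of the selected model concrete, SARIMA (1,1,1) × (1,0,1)12 can be expanded into an explicit recursion. The derivation below is a sketch under assumptions: it uses the Box-Jenkins sign convention (minus signs on the moving average terms), writes $\Phi_1$ and $\Theta_1$ for the seasonal autoregressive and seasonal moving average coefficients (notation introduced here), and treats $\varphi_0$ as a generic constant. How the reported estimates map onto these symbols depends on the software's parameterization.

$$(1 - \varphi_1 B)(1 - \Phi_1 B^{12})(1 - B)\,y_t = \varphi_0 + (1 - \theta_1 B)(1 - \Theta_1 B^{12})\,\varepsilon_t$$

Writing $w_t = (1 - B)\,y_t = y_t - y_{t-1}$ and multiplying out both sides gives

$$w_t = \varphi_0 + \varphi_1 w_{t-1} + \Phi_1 w_{t-12} - \varphi_1 \Phi_1 w_{t-13} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \Theta_1 \varepsilon_{t-12} + \theta_1 \Theta_1 \varepsilon_{t-13}$$

so each month's change in admissions depends on the change one month earlier, the change in the same month of the previous year, and the corresponding past shocks.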

Model forecast
The forecast values from January to September were generally consistent with the actual ones. For 2018, the SARIMA (1,1,1) × (1,0,1)12 model predicted a total of 625 AMS inpatients. However, the forecasts for October (-29), November (-50), and December (-31) were negative, so each of these values was replaced by the average of its two preceding months. After this adjustment, the predicted total was 992 AMS inpatients, whereas the actual number of inpatients was 1093 (Fig. 4).
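The adjustment and the resulting 9.24% shortfall can be retraced with a short calculation. The sketch below rests on two assumptions that the text does not spell out: that the raw total of 625 includes the three negative forecasts, and that a replaced month uses the already adjusted values of the months before it. The function name `replace_negatives` and the monthly values passed to it are hypothetical; only the totals come from the paper.

```python
def replace_negatives(forecasts):
    """Replace each negative forecast with the mean of the two preceding values.

    Assumption: preceding values that were themselves replaced are used in
    their adjusted form. Negative values in the first two positions are left
    unchanged because they have no two predecessors.
    """
    adjusted = list(forecasts)
    for i in range(2, len(adjusted)):
        if adjusted[i] < 0:
            adjusted[i] = (adjusted[i - 1] + adjusted[i - 2]) / 2
    return adjusted


# Totals reported in the paper.
raw_total = 625
negative_forecasts = {"Oct": -29, "Nov": -50, "Dec": -31}
adjusted_total = 992
actual_total = 1093

# Implied January-to-September forecast total if 625 includes the negatives.
jan_sep_total = raw_total - sum(negative_forecasts.values())
# Implied sum of the three replaced months after adjustment.
replaced_total = adjusted_total - jan_sep_total
# Relative shortfall of the adjusted forecast against the actual total.
shortfall_pct = round(100 * (actual_total - adjusted_total) / actual_total, 2)

# Hypothetical monthly values (not study data) to show the replacement rule.
demo = replace_negatives([100, 80, -29, -50])
print(jan_sep_total, replaced_total, shortfall_pct, demo)
```

The shortfall of 101 inpatients out of 1093 reproduces the 9.24% reported in the abstract.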

Discussion
AMS is a syndrome comprising a variety of non-specific symptoms. It is highly subjective and occurs frequently in people who rapidly enter high-altitude areas from the plains, primarily during military training, construction work in Tibet, and tourist mountaineering. The onset of AMS is closely related to factors such as altitude, ascent speed, age, and gender. However, investigation and prediction of the time series of AMS populations have been lacking. Consistent with previous studies, we found obvious seasonality and cyclicality in AMS admissions [7]. As a classic time series model, SARIMA effectively captures periodic and seasonal changes, especially for diseases with a regular pattern [8]. In this study, the SARIMA (1,1,1) × (1,0,1)12 model was used to model the number of AMS hospitalized cases from 2009 to 2018. The total predicted value was 9.24% less than the actual one, showing relatively precise predictive ability.
Although the plateau climate becomes warmer in March, the temperature gap between day and night is still large, and its adverse impact on the body may not be overcome [9]. In the spring of 2002, the US military carried out "Operation Anaconda" in a mountainous area at altitudes between 610 and 3,600 m, reporting 8 deaths and 80 injuries, and 10 further personnel suffered from AMS. AMS thus accounted for 11.36% of the force reduction (10 of 88). Altitude adaptation and medical training were subsequently strengthened in US troops [10]. The high incidence in July may be closely related to seasonal tourism, as the scenery of the plateau region attracts many tourists. Trips from the mainland to high-altitude scenic spots take place from June to August almost every year. The incidence of AMS during this period was about 64%, as most travelers enter the plateau quickly and lack pre-adaptation to the low-oxygen environment [11].
This study has some limitations. First, we used information from the medical records of AMS inpatients. The trend presented by this time series reflects only the current situation and cannot be used to evaluate the severity of the disease. The sharp increase in AMS admissions in March and July may be related to the large number of people going to the plateau [7]. Second, SARIMA performed better in short- and medium-term forecasting than in long-term forecasting. The forecast values from January to September were more consistent with the actual ones, whereas we had to adjust the forecast values for October, November, and December by replacing each with the average of its two preceding months. Researchers should therefore be cautious about longer-term results. Third, we did not consider factors such as altitude, the speed of entering the plateau, and history of altitude sickness. To make better use of model prediction, the various factors that affect AMS should be incorporated into a comprehensive analysis.
In summary, AMS admissions showed obvious periodicity and seasonality. The SARIMA model has sound fitting ability and high short-term prediction accuracy. This study may help in investigating the general characteristics of AMS inpatients and in allocating relevant medical resources.